7 research outputs found

    A Human-Centric Approach For Binary Code Decompilation

    Many security techniques have been developed both in academia and industry to analyze source code, including methods to discover bugs, apply taint tracking, or find vulnerabilities. These source-based techniques leverage the wealth of high-level abstractions available in the source code to achieve good precision and efficiency. Unfortunately, these methods cannot be applied directly to binary code, which lacks such abstractions. In security, there are many scenarios where analysts only have access to the compiled version of a program. When a program is compiled, all high-level abstractions, such as variables, types, and functions, are removed from the final version that security analysts have access to. This dissertation investigates novel methods to recover abstractions from binary code. First, a novel pattern-independent control-flow structuring algorithm is presented to recover high-level control-flow abstractions from binary code. Unlike existing structural analysis algorithms, which produce unstructured code with many goto statements, our algorithm produces fully structured, goto-free decompiled code. We implemented this algorithm in a decompiler called DREAM. Second, we develop three categories of code optimizations in order to simplify the decompiled code and increase readability. These categories are expression simplification, control-flow simplification, and semantics-aware naming. We have implemented our usability extensions on top of DREAM and call this extended version DREAM++. We conducted the first user study to evaluate the quality of decompilers for malware analysis. We have chosen malware since it represents one of the most challenging cases for binary code analysis. The study included six reverse engineering tasks of real malware samples that we obtained from independent malware experts. We evaluated three decompilers: the leading industry decompiler Hex-Rays and both versions of our decompiler, DREAM and DREAM++.
The results of our study show that our improved decompiler DREAM++ produced significantly more understandable code, outperforming both Hex-Rays and DREAM. Using DREAM++, participants solved 3 times more tasks than when using Hex-Rays and 2 times more tasks than when using DREAM. Moreover, participants rated DREAM++ significantly higher than the competition.

    No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring and Semantics-Preserving Transformations

    Decompilation is important for many security applications; it facilitates the tedious task of manual malware reverse engineering and enables the use of source-based security tools on binary code. This includes tools to find vulnerabilities, discover bugs, and perform taint tracking. Recovering high-level control constructs is essential for decompilation in order to produce structured code that is suitable for human analysts and source-based program analysis techniques. State-of-the-art decompilers rely on structural analysis, a pattern-matching approach over the control flow graph, to recover control constructs from binary code. Whenever no match is found, they generate goto statements and thus produce unstructured decompiled output. Those statements are problematic because they make decompiled code harder to understand and less suitable for program analysis. In this paper, we present DREAM, the first decompiler to offer a goto-free output. DREAM uses a novel pattern-independent control-flow structuring algorithm that can recover all control constructs in binary programs and produce structured decompiled code without any goto statement. We also present semantics-preserving transformations that can transform unstructured control flow graphs into structured graphs. We demonstrate the correctness of our algorithms and show that we outperform both the leading industry and academic decompilers: Hex-Rays and Phoenix. We use the GNU coreutils suite of utilities as a benchmark. Apart from reducing the number of goto statements to zero, DREAM also produced more compact code (fewer lines of code) for 72.7% of decompiled functions compared to Hex-Rays and 98.8% compared to Phoenix. We also present a comparison of Hex-Rays and DREAM when decompiling three samples from the Cridex, ZeusP2P, and SpyEye malware families.
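
    The contrast between pattern matching and goto-free output can be illustrated with a small sketch. The code below is not DREAM's implementation; it is a toy example under stated assumptions, and the graph, node names, and helper functions (`reaching_conditions`, `holds`, `run_cfg`, `run_guarded`) are all hypothetical. It takes an acyclic control flow graph whose shape a simple if/else pattern would not match, computes for each node the condition under which that node is reached, and checks that executing every node under its own guard gives the same trace as following the branches, without any goto.

```python
from itertools import product

# Hypothetical toy CFG: node -> list of (successor, literal) edges.
# A literal is (condition_name, required_value); None means unconditional.
# Node C is reachable from both A and B, so no plain if/else pattern fits.
CFG = {
    "A": [("B", ("c1", True)), ("C", ("c1", False))],
    "B": [("D", ("c2", True)), ("C", ("c2", False))],
    "C": [("E", None)],
    "D": [("E", None)],
    "E": [],
}
ORDER = ["A", "B", "C", "D", "E"]  # one topological order of the toy graph


def reaching_conditions(cfg, entry):
    """Map each node to a set of paths (DNF); a path is a frozenset of literals.

    Enumerates all paths, so it only suits small acyclic graphs.
    """
    reach = {node: set() for node in cfg}
    reach[entry].add(frozenset())

    def walk(node, path):
        for succ, lit in cfg[node]:
            new_path = path if lit is None else path | {lit}
            reach[succ].add(new_path)
            walk(succ, new_path)

    walk(entry, frozenset())
    return reach


def holds(dnf, env):
    """True if at least one path's literals all agree with the assignment env."""
    return any(all(env[name] == value for name, value in path) for path in dnf)


def run_cfg(cfg, entry, env):
    """Follow branch edges from entry and return the list of visited nodes."""
    node, trace = entry, [entry]
    while cfg[node]:
        node = next(s for s, lit in cfg[node]
                    if lit is None or env[lit[0]] == lit[1])
        trace.append(node)
    return trace


def run_guarded(reach, order, env):
    """Goto-free view: visit nodes in order, each guarded by its condition."""
    return [n for n in order if holds(reach[n], env)]


if __name__ == "__main__":
    reach = reaching_conditions(CFG, "A")
    for c1, c2 in product([True, False], repeat=2):
        env = {"c1": c1, "c2": c2}
        print(env, run_cfg(CFG, "A", env), run_guarded(reach, ORDER, env))
```

    For every assignment of `c1` and `c2`, the two traces printed on each line should match. The sketch covers only acyclic regions and performs no condition simplification; loops and the semantics-preserving transformations mentioned in the abstract are outside its scope.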